Abstract

For maximizing influence spread in a social network, given a certain budget on 
the number of seed nodes, we investigate the effects of selecting and activating the 
seed nodes in multiple phases. In particular, we formulate an appropriate objective 
function for two-phase influence maximization under the independent cascade model, 
investigate its properties, and propose algorithms for determining the seed nodes in 
the two phases. We also study the problem of determining an optimal budget-split and 
delay between the two phases. 
1 Introduction

Social networks play a fundamental role in the spread of influence on a large scale; this is harnessed by companies for viral marketing. The problem of influence maximization deals with selecting k seed nodes where the diffusion should be triggered, so as to maximize the influence when the diffusion concludes; we call k the budget. This problem has been extensively studied in the literature [5], including that of AAA1AS. The basic idea of using multiple phases for maximizing an objective function has been presented in [T]. To the best of our knowledge, ours is the first detailed effort to study multi-phase diffusion in social networks.

An advantage of multi-phase diffusion is that the seed nodes in any phase except the first can be chosen based on the spread observed so far, giving more certainty during seed selection. But owing to delayed seed selection, the diffusion may be slower, which comes at a cost in time.

* Please cite the original publication that will appear in the Proceedings of the 14th International Conference on Autonomous Agents & Multiagent Systems, 2015. This work is funded by Adobe Research Labs, Bangalore, India. The first and second authors are supported by IBM and TCS Doctoral Fellowships, respectively. The authors thank Surabhi Akotiya for the useful discussions.
2 Problem Formulation


As a starting point, we focus on two-phase diffusion. Given a graph $G$, we consider the Independent Cascade (IC) model, where $p_{uv}$ is the probability with which node $u$ can influence node $v$. Let $X$ be a live graph (obtained by independently sampling the edges of $G$) and $p(X)$ be the probability of its occurrence. Let $\sigma^X(S)$ be the number of nodes reachable from set $S$ in $X$ (so the expected number of influenced nodes at the end of single-phase diffusion with seed set $S$ is $\sigma(S) = \sum_X p(X)\,\sigma^X(S)$).
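For small graphs, $\sigma(S)$ can be evaluated exactly from this definition by enumerating all $2^{|E|}$ live graphs. The Python sketch below does so; the edge-dictionary representation and the helper names (`live_graphs`, `reachable`, `sigma`) are illustrative assumptions, and the test graph is the four-node example used later in this section. Because enumeration is exponential in the number of edges, larger graphs would need Monte-Carlo estimation instead.

```python
from itertools import product

def live_graphs(edges):
    """Yield (X, p(X)) for every live graph X; edges maps (u, v) -> p_uv."""
    items = list(edges.items())
    for mask in product((True, False), repeat=len(items)):
        live, prob = set(), 1.0
        for ((u, v), p), on in zip(items, mask):
            # each edge is independently live with probability p_uv
            prob *= p if on else 1.0 - p
            if on:
                live.add((u, v))
        yield live, prob

def reachable(live, seeds):
    """Nodes reachable from the seed set using only live edges (size = sigma^X(S))."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for (a, b) in live:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def sigma(edges, seeds):
    """Exact expected spread: sigma(S) = sum_X p(X) * sigma^X(S)."""
    return sum(p * len(reachable(live, seeds)) for live, p in live_graphs(edges))

edges = {("A", "B"): 0.5, ("B", "C"): 0.8, ("B", "D"): 0.9}
print(round(sigma(edges, {"A"}), 4))  # 2.35
```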

At the beginning (time 0), let $k_1$ seed nodes be selected for the first phase and, after a delay $d$, $k_2$ ($\le k - k_1$) for the second phase. We aim to maximize the expected influence at the end of the two-phase diffusion. For now, assume $k_1$, $k_2$, $d$ to be given; our objective is to determine the seeds for the two phases.

Let $S_1$ be the seed set for the first phase and $X$ be the destined live graph (unknown at time 0). Let $Y$ be the observed diffusion at time $d$, which gives $\mathcal{A}^Y$ and $\mathcal{R}^Y$, the sets of already and recently influenced nodes, respectively. At time $d$, given that the nodes in $\mathcal{R}^Y$ effectively are seeds for the second phase (as per the IC model), we aim to select an additional seed set $S_2^{O(Y,k_2)}$ of size $k_2$ that maximizes the final influence. We also write it as $S_2^{O(X,S_1,d,k_2)}$, since $Y$ is unique for a particular $(X, S_1, d)$. So our objective is to find $S_1$ that maximizes


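$$g(S_1) = \sum_Y p(Y)\left\{|\mathcal{A}^Y| + \sigma\left(\mathcal{R}^Y \cup S_2^{O(Y,k_2)}\right)\right\} = \sum_X p(X)\,\sigma^X\left(S_1 \cup S_2^{O(X,S_1,d,k_2)}\right)$$

where $\sigma(\cdot)$ in the first sum is evaluated on the graph obtained from $G$ by deleting $\mathcal{A}^Y$.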

Note that the choice of $S_2^{O(X,S_1,d,k_2)} = S_2^{O(Y,k_2)}$ depends not just on $X$ but on $Y$, and hence on all live graphs that could result from $Y$ (just as, in single-phase diffusion, the choice of the best seed set depends on all live graphs that could result from $G$). NP-hardness of maximizing $g(\cdot)$ is clear, since for $k_2 = 0$ it reduces to single-phase influence maximization. It can be shown that, for fixed $k_2$ and $d$, $g(\cdot)$ is non-negative and monotone increasing (note that with $k_2$ and $d$ as variables, $g(\cdot)$ is not monotone), but it is neither submodular nor supermodular. However, simulations on the test graphs showed that the diminishing marginal returns property (characteristic of submodular functions) holds in most cases.

An example for computing $g(\cdot)$: consider a graph with nodes $\{A, B, C, D\}$ and $p_{AB} = 0.5$, $p_{BC} = 0.8$, $p_{BD} = 0.9$. Let $S_1 = \{A\}$, $k_2 = 1$, $d = 1$. Table 1 lists the two possibilities of $Y$ ($S_2^{O(Y,k_2)}$ is easy to compute). We get $g(\{A\}) = 3.80$.
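The value $g(\{A\}) = 3.80$ can be checked by brute force. The sketch below uses a hypothetical helper `g` that, for every live graph $X$, derives $\mathcal{A}^Y$ and $\mathcal{R}^Y$ from the activation times up to $d$, builds $G_d$ by deleting $\mathcal{A}^Y$, and picks $S_2^{O(Y,k_2)}$ by trying every $k_2$-subset with exact $\sigma$ on $G_d$. All helper names are illustrative, and the approach is only feasible for toy graphs.

```python
from itertools import combinations, product

def live_graphs(edges):
    """Yield (X, p(X)) for every live graph X; edges maps (u, v) -> p_uv."""
    items = list(edges.items())
    for mask in product((True, False), repeat=len(items)):
        live, prob = set(), 1.0
        for ((u, v), p), on in zip(items, mask):
            prob *= p if on else 1.0 - p
            if on:
                live.add((u, v))
        yield live, prob

def activation_times(live, seeds):
    """IC diffusion on live graph X: node -> time step at which it is influenced."""
    times = {s: 0 for s in seeds}
    frontier, t = set(seeds), 0
    while frontier:
        t += 1
        frontier = {v for (u, v) in live if u in frontier and v not in times}
        for v in frontier:
            times[v] = t
    return times

def sigma(edges, seeds):
    """Exact expected spread sigma(S) = sum_X p(X) * sigma^X(S)."""
    return sum(p * len(activation_times(live, seeds)) for live, p in live_graphs(edges))

def g(edges, nodes, s1, k2, d):
    """g(S_1) = sum_X p(X) sigma^X(S_1 u S_2^O), with S_2^O found by brute force on G_d."""
    total = 0.0
    for live, p_x in live_graphs(edges):
        times = activation_times(live, s1)
        already = {v for v, t in times.items() if t < d}   # A^Y
        recent = {v for v, t in times.items() if t == d}   # R^Y
        # G_d: G with the already-influenced nodes (and their edges) deleted
        g_d = {(u, v): p for (u, v), p in edges.items()
               if u not in already and v not in already}
        candidates = [v for v in nodes if v not in already and v not in recent]
        # S_2^O(Y, k_2): best k_2-set, given R^Y as a partial seed set
        s2 = max(combinations(candidates, k2),
                 key=lambda c: sigma(g_d, recent | set(c)))
        total += p_x * len(activation_times(live, set(s1) | set(s2)))
    return total

edges = {("A", "B"): 0.5, ("B", "C"): 0.8, ("B", "D"): 0.9}
print(round(g(edges, ["A", "B", "C", "D"], {"A"}, k2=1, d=1), 2))  # 3.8
```

Grouping the eight live graphs by the two possible $Y$ gives the same number: $g(\{A\}) = 0.5\,(1 + 2.9) + 0.5\,(1 + 2.7) = 3.80$, where $2.9 = \sigma(\{B, C\})$ and $2.7 = \sigma(\{B\})$ on $G_d$. With `k2=0` the function reduces to $\sigma(S_1)$.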

S_1 = {A}, k_2 = 1, d = 1

X              p(X)   Y      A^Y   R^Y   S_2^O(Y,k_2)   σ^X(S_1 ∪ S_2^O)
{AB, BC, BD}   0.36   {AB}   {A}   {B}   {C}            4
{AB, BC}       0.04   {AB}   {A}   {B}   {C}            3
{AB, BD}       0.09   {AB}   {A}   {B}   {C}            4
{AB}           0.01   {AB}   {A}   {B}   {C}            3
{BC, BD}       0.36   {}     {A}   {}    {B}            4
{BC}           0.04   {}     {A}   {}    {B}            3
{BD}           0.09   {}     {A}   {}    {B}            3
{}             0.01   {}     {A}   {}    {B}            2

g(S_1) = Σ_X p(X) σ^X(S_1 ∪ S_2^O) = 3.80

Table 1: Table for the example

Since it is impractical to compute $S_2^{O(X,S_1,d,k_2)}$, consider $f(S_1) = \sum_X p(X)\,\sigma^X\left(S_1 \cup S_2^{G(X,S_1,d,k_2)}\right)$, where $S_2^{G(X,S_1,d,k_2)}$ is a set of size $k_2$ obtained using the greedy algorithm. It can be shown that $f(\cdot)$ gives a $(1 - 1/e - \epsilon)$ approximation to $g(\cdot)$, where $\epsilon$ is small for a large number of Monte-Carlo iterations while computing $f(\cdot)$. Since the greedy algorithm is not scalable, consider $h(S_1) = \sum_X p(X)\,\sigma^X\left(S_1 \cup S_2^{D(X,S_1,d,k_2)}\right)$, where $S_2^{D(X,S_1,d,k_2)}$ is a set of size $k_2$ obtained using the generalized degree discount heuristic (GDD). GDD can be developed based on the argument for Theorem 2 in [2]: until the budget is exhausted, iteratively select a node $v$ having the largest value of

$$\Big(\prod_{x \in \mathcal{X}} (1 - p_{xv})\Big)\Big(1 + \sum_{y \in \mathcal{Y}} p_{vy}\Big),$$

where $\mathcal{X}$ is the set of in-neighbors of $v$ already selected as seeds and $\mathcal{Y}$ is the set of out-neighbors of $v$ not yet selected as seeds. Using simulations, we observed for almost all pairs $S, T$ that:

(a) $f(T) > f(S) \Rightarrow h(T) > h(S)$, critical for set selection,

(b) $\frac{f(T)}{f(S)} \approx \frac{h(T)}{h(S)}$, critical for algorithms that depend on ratios of function values given by sets, e.g., the fully adaptive cross entropy algorithm (FACE) with weighted update rule [3].
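The GDD selection rule stated above can be sketched in Python as follows. The function name `gdd` and its `partial_seeds` argument (used to pass $\mathcal{R}^Y$ as the partial seed set in the second phase) are assumptions for illustration, and ties are broken by node order.

```python
def gdd(edges, nodes, k, partial_seeds=()):
    """Generalized degree discount (sketch): add up to k seeds to partial_seeds.

    edges maps (u, v) -> p_uv; returns the newly chosen seeds in selection order.
    """
    seeds = set(partial_seeds)
    chosen = []
    for _ in range(k):
        best, best_score = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            # product over in-neighbors x already selected as seeds of (1 - p_xv)
            not_reached = 1.0
            for (x, w), p in edges.items():
                if w == v and x in seeds:
                    not_reached *= 1.0 - p
            # 1 + sum over out-neighbors y not yet selected as seeds of p_vy
            gain = 1.0 + sum(p for (u, y), p in edges.items() if u == v and y not in seeds)
            score = not_reached * gain
            if score > best_score:
                best, best_score = v, score
        if best is None:  # fewer than k candidate nodes left
            break
        seeds.add(best)
        chosen.append(best)
    return chosen

edges = {("A", "B"): 0.5, ("B", "C"): 0.8, ("B", "D"): 0.9}
print(gdd(edges, ["A", "B", "C", "D"], k=1))  # ['B']
# Second phase when AB is live: G_d drops A, and R^Y = {B} is a partial seed set
g_d = {("B", "C"): 0.8, ("B", "D"): 0.9}
print(gdd(g_d, ["B", "C", "D"], k=1, partial_seeds={"B"}))  # ['C']
```

On the example graph, the second-phase call returns C, which matches the optimal second-phase seed in the example when AB is live.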

We now present a general algorithm for two-phase influence maximization. Let $\mathcal{F}_1(\cdot)$ and $\mathcal{F}_2(\cdot)$ be the objective functions for the first and second phases, respectively. Consider an algorithm $A$ for single-phase influence maximization.

Algorithm 1 Two-phase general algorithm (IC model)

Input: $G$, $k_1$, $k_2$, $d$

1: First phase: Find a set $S_1$ of size $k_1$ using $A$ for maximizing $\mathcal{F}_1(\cdot)$ on $G$, and run the IC model until time $d$

2: Second phase: At time $d$, construct $G_d$ from $G$ by deleting $\mathcal{A}^Y$; assuming $\mathcal{R}^Y$ forms a partial seed set, find a set $S_2$ of size $k_2$ using $A$ for maximizing $\mathcal{F}_2(\cdot)$ on $G_d$
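A minimal Python sketch of one Monte-Carlo run of Algorithm 1 follows. The single-phase algorithm $A$ is passed in as `select`; the `top_weighted_degree` selector is a hypothetical stand-in (greedy, PMIA, FACE or GDD could take its place), and `run_ic` and `two_phase` are likewise illustrative names. Because the first-phase simulation stops at time $d$, the nodes influenced in its last step are exactly $\mathcal{R}^Y$, whose outgoing edges have not yet been tried.

```python
import random

def run_ic(edges, frontier, active, rng, max_steps=None):
    """Continue an IC diffusion, updating `active` in place.

    Each node in `frontier` gets one chance to influence each inactive out-neighbor.
    Stops after `max_steps` steps (if given) or when nobody new is influenced, and
    returns the nodes influenced in the last step taken.
    """
    t = 0
    while frontier and (max_steps is None or t < max_steps):
        t += 1
        newly = set()
        for (u, v), p in edges.items():
            if u in frontier and v not in active and v not in newly and rng.random() < p:
                newly.add(v)
        active |= newly
        frontier = newly
    return frontier

def top_weighted_degree(edges, nodes, k, partial_seeds):
    """Hypothetical stand-in for algorithm A: k non-seed nodes with largest sum of p_uv."""
    score = {v: 0.0 for v in nodes}
    for (u, _), p in edges.items():
        score[u] += p
    candidates = [v for v in nodes if v not in partial_seeds]
    return sorted(candidates, key=lambda v: score[v], reverse=True)[:k]

def two_phase(edges, nodes, k1, k2, d, select, rng):
    """One simulated run of the two-phase scheme; returns (S_1, S_2, influenced nodes)."""
    # First phase: choose S_1 on G, then run the IC model until time d.
    s1 = select(edges, nodes, k1, set())
    active = set(s1)
    recent = run_ic(edges, set(s1), active, rng, max_steps=d)  # R^Y
    already = active - recent                                  # A^Y
    # Second phase: G_d is G with A^Y deleted; R^Y acts as a partial seed set.
    g_d = {(u, v): p for (u, v), p in edges.items()
           if u not in already and v not in already}
    nodes_d = [v for v in nodes if v not in already]
    s2 = select(g_d, nodes_d, k2, recent)
    active |= set(s2)
    run_ic(g_d, recent | set(s2), active, rng)                 # diffuse to completion
    return s1, s2, active

edges = {("A", "B"): 0.5, ("B", "C"): 0.8, ("B", "D"): 0.9}
s1, s2, final = two_phase(edges, ["A", "B", "C", "D"], k1=1, k2=1, d=1,
                          select=top_weighted_degree, rng=random.Random(0))
print(s1, s2, sorted(final))
```

The stand-in selector ignores how $\mathcal{R}^Y$ affects its candidates' scores; a farsighted or myopic instantiation would replace it with an algorithm that optimizes $\mathcal{F}_1$ and $\mathcal{F}_2$. Averaging the size of the returned set over many seeds of `rng` estimates the expected two-phase spread.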

We explore two special cases (note that if A does not compute the expected spread, the two 
cases are identical): 

1. Farsighted: $\mathcal{F}_1(S_1) = h(S_1)$, $\mathcal{F}_2(S_2) = \sigma(\mathcal{R}^Y \cup S_2)$

2. Myopic: $\mathcal{F}_1(S_1) = \sigma(S_1)$, $\mathcal{F}_2(S_2) = \sigma(\mathcal{R}^Y \cup S_2)$
Figure 1: (a) Typical progression of diffusion for k = 6 with different ⟨k_1, d⟩ pairs (k_2 = 6 − k_1) on the Les Miserables dataset (WC model); (b) typical observation of splitting budget k = 200 (with optimal delay) for different values of δ on the High Energy Physics - Theory collaboration network (WC model)

3 Experimental Findings 

For studying diffusion using IC, we explore the weighted cascade (WC) and trivalency models P]. Plots such as the one in Figure 1(a) may help decide the ideal values of $k_1$ and $d$ based on the desired transient dynamics. To capture the rate of diffusion, we generalize $\sigma(\cdot)$ to $\sum_{t=0}^{\infty} \Gamma(t)\,\sigma_t(\cdot)$, where $\Gamma(\cdot) \le 1$ is non-increasing and $\sigma_t(\cdot)$ is the expected number of recently influenced nodes at time $t$. We consider $\Gamma(t) = \delta^t$, $\delta \in [0,1]$ in our experiments. We found FACE [3] to be an effective method for concurrently optimizing over $k_1$, $d$, $S_1$, by allowing each data sample to consist of a value of $k_1$ sampled from $\{1, \ldots, k\}$, a value of $d$ sampled from $\{1, \ldots, D\}$ ($D$ is some large delay after which diffusion is guaranteed to stop), and a sampled set $S_1$ of size $k_1$.
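Since $\sum_t \Gamma(t)\,\sigma_t(S)$ equals the expected value of $\sum_v \delta^{t_v}$ over influenced nodes $v$ with influence times $t_v$, the discounted spread for $\Gamma(t) = \delta^t$ can be computed exactly on small graphs. The sketch below (hypothetical helper names, same four-node example graph) covers single-phase diffusion only; for two-phase diffusion the times $t_v$ would come from the combined diffusion, with second-phase seeds entering at time $d$.

```python
from itertools import product

def live_graphs(edges):
    """Yield (X, p(X)) for every live graph X; edges maps (u, v) -> p_uv."""
    items = list(edges.items())
    for mask in product((True, False), repeat=len(items)):
        live, prob = set(), 1.0
        for ((u, v), p), on in zip(items, mask):
            prob *= p if on else 1.0 - p
            if on:
                live.add((u, v))
        yield live, prob

def activation_times(live, seeds):
    """IC diffusion on live graph X: node -> time step at which it is influenced."""
    times = {s: 0 for s in seeds}
    frontier, t = set(seeds), 0
    while frontier:
        t += 1
        frontier = {v for (u, v) in live if u in frontier and v not in times}
        for v in frontier:
            times[v] = t
    return times

def discounted_spread(edges, seeds, delta):
    """sum_t delta**t * sigma_t(S), with sigma_t(S) the expected number of nodes
    first influenced at time t (exact, by live-graph enumeration)."""
    return sum(p * sum(delta ** t for t in activation_times(live, seeds).values())
               for live, p in live_graphs(edges))

edges = {("A", "B"): 0.5, ("B", "C"): 0.8, ("B", "D"): 0.9}
print(round(discounted_spread(edges, {"A"}, 0.5), 4))  # 1.4625
```

With $\delta = 1$ this recovers $\sigma(\{A\}) = 2.35$; with $\delta = 0.5$ the example graph gives $1 + 0.5 \cdot 0.5 + 0.25 \cdot 0.85 = 1.4625$, since B is influenced at time 1 with probability 0.5 and C, D at time 2 with probabilities 0.4 and 0.45.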

For $\delta = 1$, we observe that $d = D$ (clearly) and $k_1 \approx k_2$ give the best results (Figure 1(b)), a reason being the trade-off between (i) the size of the observed diffusion and (ii) the exploitation based on the observed diffusion. For most values of $k$, the gain of two-phase diffusion over single-phase diffusion is 5-10% for algorithms such as greedy, PMIA pTj, FACE [3], and GDD, in the absence of temporal constraints. This gain is significant when the concern is monetary profits or a long-term customer base. Also, myopic algorithms perform on par with farsighted ones while running much faster (for greedy and FACE). We conclude: (a) under strict temporal constraints, use single-phase diffusion; (b) under moderate temporal constraints, use two-phase diffusion with a short delay while allocating most of the budget to the first phase; (c) in the absence of temporal constraints, use two-phase diffusion with a long enough delay and an almost equal budget for the two phases.



4 Future work 


There is a need for scalable algorithms that concurrently optimize over $k_1$, $d$, $S_1$ (perhaps exploiting the unimodal nature of the plots in Figure 1(b)). We considered a naive, strict (exponential) decay function, which put two-phase diffusion at a disadvantage for most values of $\delta$; a more realistic function needs to be studied. One could also study how multi-phase diffusion can be used to achieve a desired spread with a reduced budget.